
    Analysis of relative influence of nodes in directed networks

    Many complex networks are described by directed links; in such networks, a link represents, for example, the control of one node over another or a unidirectional flow of information. Various centrality measures are used to determine the relative importance of nodes specifically in directed networks. We analyze one such centrality measure, called the influence. The influence represents the importance of nodes in dynamics such as synchronization, evolutionary dynamics, random walks, and social dynamics. We analytically calculate the influence in various networks, including directed multipartite networks and a directed version of the Watts-Strogatz small-world network. Global properties of networks, such as hierarchy and the position of shortcuts, rather than local properties of the nodes, such as the degree, are shown to be the chief determinants of the influence of nodes in many cases. The developed method is also applicable to the calculation of the PageRank. We also numerically show that in a coupled oscillator system, the threshold for entrainment by a pacemaker is low when the pacemaker is placed on influential nodes. For a type of random network, the analytically derived threshold is approximately equal to the inverse of the influence. We numerically show that this relationship also holds in a random scale-free network and a neural network.
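
    The abstract notes that the method also yields the PageRank. As a minimal illustration of the kind of eigenvector computation involved, the sketch below runs power iteration for PageRank on a tiny directed graph; the damping factor, tolerance, and function name are illustrative choices, not taken from the paper.

        import numpy as np

        def pagerank(adj, d=0.85, tol=1e-10, max_iter=1000):
            """Power iteration for PageRank on a directed graph.
            adj[i][j] = 1 if there is a link i -> j; dangling nodes
            (no out-links) are treated as linking to every node."""
            A = np.asarray(adj, dtype=float)
            n = A.shape[0]
            out = A.sum(axis=1)
            # Row-stochastic transition matrix; dangling rows -> uniform.
            P = np.where(out[:, None] > 0,
                         A / np.maximum(out, 1)[:, None], 1.0 / n)
            r = np.full(n, 1.0 / n)
            for _ in range(max_iter):
                r_new = (1 - d) / n + d * (P.T @ r)
                if np.abs(r_new - r).sum() < tol:
                    break
                r = r_new
            return r

        # Three-node chain with a shortcut: 0 -> 1 -> 2 and 0 -> 2.
        print(pagerank([[0, 1, 1], [0, 0, 1], [0, 0, 0]]))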

    A Bayesian approach to the estimation of maps between Riemannian manifolds

    Let $\Theta$ be a smooth compact oriented manifold without boundary, embedded in a Euclidean space, and let $\gamma$ be a smooth map from $\Theta$ into a Riemannian manifold $\Lambda$. An unknown state $\theta \in \Theta$ is observed via $X = \theta + \epsilon \xi$, where $\epsilon > 0$ is a small parameter and $\xi$ is a white Gaussian noise. For a given smooth prior on $\Theta$ and a smooth estimator $g$ of the map $\gamma$, we derive a second-order asymptotic expansion for the related Bayesian risk. The calculation involves the geometry of the underlying spaces $\Theta$ and $\Lambda$, in particular, the integration-by-parts formula. Using this result, a second-order minimax estimator of $\gamma$ is found, based on the modern theory of harmonic maps and hypo-elliptic differential operators.
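
    For concreteness, the observation model and the Bayes risk being expanded can be written out as below; taking the loss to be squared geodesic distance on $\Lambda$ is an assumption made here for illustration, not a quotation from the paper.

        % Observation model and Bayes risk; \pi is the given smooth prior
        % density on \Theta, and the loss (squared geodesic distance
        % d_\Lambda on \Lambda) is an assumed choice.
        X = \theta + \epsilon\,\xi, \qquad \theta \in \Theta,\quad
        \epsilon > 0, \quad \xi \sim \text{white Gaussian noise},
        \qquad
        R_\epsilon(g) = \int_\Theta \mathbb{E}_\theta\,
          d_\Lambda^{\,2}\bigl(g(X),\, \gamma(\theta)\bigr)\,
          \pi(\theta)\, d\theta.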

    Parameterized Algorithms for Graph Partitioning Problems

    We study a broad class of graph partitioning problems, where each problem is specified by a graph $G=(V,E)$ and parameters $k$ and $p$. We seek a subset $U\subseteq V$ of size $k$, such that $\alpha_1 m_1 + \alpha_2 m_2$ is at most (or at least) $p$, where $\alpha_1,\alpha_2\in\mathbb{R}$ are constants defining the problem, and $m_1, m_2$ are the cardinalities of the edge sets having both endpoints, and exactly one endpoint, in $U$, respectively. This class of fixed cardinality graph partitioning problems (FGPP) encompasses Max $(k,n-k)$-Cut, Min $k$-Vertex Cover, $k$-Densest Subgraph, and $k$-Sparsest Subgraph. Our main result is an $O^*(4^{k+o(k)}\Delta^k)$ algorithm for any problem in this class, where $\Delta \geq 1$ is the maximum degree in the input graph. This resolves an open question posed by Bonnet et al. [IPEC 2013]. We obtain faster algorithms for certain subclasses of FGPPs, parameterized by $p$, or by $(k+p)$. In particular, we give an $O^*(4^{p+o(p)})$ time algorithm for Max $(k,n-k)$-Cut, thus significantly improving the best known $O^*(p^p)$ time algorithm.
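
    To make the objective concrete, the sketch below evaluates $\alpha_1 m_1 + \alpha_2 m_2$ for a candidate set $U$ and recovers Max $(k,n-k)$-Cut as the instance $\alpha_1 = 0$, $\alpha_2 = 1$ (maximized). The brute-force search is for illustration only and is not the paper's algorithm.

        from itertools import combinations

        def fgpp_value(edges, U, alpha1, alpha2):
            """alpha1*m1 + alpha2*m2, where m1 counts edges with both
            endpoints in U and m2 those with exactly one endpoint in U."""
            U = set(U)
            m1 = sum(1 for u, v in edges if u in U and v in U)
            m2 = sum(1 for u, v in edges if (u in U) != (v in U))
            return alpha1 * m1 + alpha2 * m2

        def max_k_cut_brute_force(n, edges, k):
            """Max (k, n-k)-Cut = FGPP with alpha1 = 0, alpha2 = 1,
            maximized; brute force over all size-k subsets."""
            return max((fgpp_value(edges, U, 0, 1), U)
                       for U in combinations(range(n), k))

        # 4-cycle: a 2-vs-2 bipartition class cuts all four edges.
        edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
        print(max_k_cut_brute_force(4, edges, 2))  # (4, (1, 3))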

    Cluster Editing: Kernelization based on Edge Cuts

    Kernelization algorithms for the cluster editing problem have been a popular topic in recent research in parameterized computation. Thus far, most kernelization algorithms for this problem are based on the concept of critical cliques. In this paper, we present new observations and new techniques for the study of kernelization algorithms for the cluster editing problem. Our techniques are based on the study of the relationship between cluster editing and graph edge-cuts. As an application, we present an $O(n^2)$-time algorithm that constructs a $2k$ kernel for the weighted version of the cluster editing problem. Our result matches the best known kernel size for the unweighted version of the cluster editing problem, and significantly improves the previous best kernel of quadratic size for the weighted version of the problem.
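
    As a minimal sketch of what the problem asks (the kernelization itself is not reproduced), the following counts the edge insertions and deletions needed to turn a graph into a given disjoint union of cliques; all names are illustrative.

        def editing_cost(edges, partition):
            """Edge edits needed so each block of `partition` becomes a
            clique and no edges run between blocks (unweighted case)."""
            block = {}
            for b, nodes in enumerate(partition):
                for v in nodes:
                    block[v] = b
            E = {frozenset(e) for e in edges}
            deletions = sum(1 for e in E if block[min(e)] != block[max(e)])
            # Missing intra-block edges must be inserted.
            insertions = sum(len(ns) * (len(ns) - 1) // 2 for ns in partition)
            insertions -= sum(1 for e in E if block[min(e)] == block[max(e)])
            return deletions + insertions

        # Path 0-1-2 clustered as one block: insert edge {0,2}, so cost 1.
        print(editing_cost([(0, 1), (1, 2)], [[0, 1, 2]]))  # 1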

    Dynamical SimRank search on time-varying networks

    SimRank is an appealing pairwise similarity measure based on graph structure. It iteratively follows the intuition that two nodes are assessed as similar if they are pointed to by similar nodes. Many real graphs are large, and their links are constantly subject to minor changes. In this article, we study the efficient dynamical computation of all-pairs SimRank on time-varying graphs. Existing methods for dynamical SimRank computation [e.g., LTSF (Shao et al. in PVLDB 8(8):838–849, 2015) and READS (Zhang et al. in PVLDB 10(5):601–612, 2017)] mainly focus on top-k search with respect to a given query. For all-pairs dynamical SimRank search, Li et al. (EDBT, 2010) proposed an approach that first factorizes the graph via a singular value decomposition (SVD) and then incrementally maintains this factorization in response to link updates, at the expense of exactness. As a result, all pairs of SimRanks are updated approximately, yielding (Formula presented.) time and (Formula presented.) memory in a graph with n nodes, where r is the target rank of the low-rank SVD. Our solution to the dynamical computation of SimRank comprises five ingredients: (1) We first consider edge updates that do not accompany new node insertions. We show that the SimRank update (Formula presented.) in response to every link update is expressible as a rank-one Sylvester matrix equation. This provides an incremental method requiring (Formula presented.) time and (Formula presented.) memory in the worst case to update (Formula presented.) pairs of similarities for K iterations. (2) To speed up the computation further, we propose a lossless pruning strategy that captures the “affected areas” of (Formula presented.) to eliminate unnecessary retrieval. This reduces the time of the incremental SimRank to (Formula presented.), where m is the number of edges in the old graph, and (Formula presented.) is the size of the “affected areas” in (Formula presented.); in practice, (Formula presented.). (3) We also consider edge updates that accompany node insertions, and categorize them into three cases, according to which end of the inserted edge is a new node. For each case, we devise an efficient incremental algorithm that can support new node insertions and accurately update the affected SimRanks. (4) We next study batch updates for dynamical SimRank computation, and design an efficient batch incremental method that handles “similar sink edges” simultaneously and eliminates redundant edge updates. (5) To achieve linear memory, we devise a memory-efficient strategy that dynamically updates all pairs of SimRanks column by column in just (Formula presented.) memory, without the need to store all (Formula presented.) pairs of old SimRank scores. Experimental studies on various datasets demonstrate that our solution substantially outperforms the existing incremental SimRank methods, and is faster and more memory-efficient than its competitors on million-scale graphs.
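
    For reference, the sketch below computes naive all-pairs SimRank by fixed-point iteration, the baseline that incremental methods such as those above accelerate. The decay factor C, the iteration count K, and the matrix formulation are standard choices, not the article's algorithm.

        import numpy as np

        def simrank(adj, C=0.6, K=10):
            """Naive all-pairs SimRank: s(a, b) averages the similarity
            of the in-neighbours of a and b, scaled by C, with
            s(a, a) = 1. Quadratic memory; for illustration only."""
            A = np.asarray(adj, dtype=float)   # A[i, j] = 1 for edge i -> j
            n = A.shape[0]
            indeg = A.sum(axis=0)
            W = A / np.maximum(indeg, 1)       # column-normalised adjacency
            S = np.eye(n)
            for _ in range(K):
                S = C * (W.T @ S @ W)
                np.fill_diagonal(S, 1.0)       # pin s(a, a) = 1
            return S

        # Nodes 1 and 2 are both pointed to by node 0 and become similar.
        print(simrank([[0, 1, 1], [0, 0, 0], [0, 0, 0]]))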

    Proximity curves for potential-based clustering

    The concept of a proximity curve and a new algorithm are proposed for obtaining clusters in a finite set of data points in a finite-dimensional Euclidean space. Each point is endowed with a potential constructed by means of a multi-dimensional Cauchy density, contributing to an overall anisotropic potential function. Guided by the steepest descent algorithm, the data points are successively visited and removed one by one, and at each stage the overall potential is updated and the magnitude of its local gradient is calculated. The result is a finite sequence of tuples, the proximity curve, whose pattern is analysed to give rise to a deterministic clustering. The finite set of all such proximity curves, in conjunction with a simulation study of their distribution, results in a probabilistic clustering represented by a distribution on the set of dendrograms. A two-dimensional synthetic data set is used to illustrate the proposed potential-based clustering idea. It is shown that the results achieved are plausible, since both the ‘geographic distribution’ of the data points and the ‘topographic features’ imposed by the potential function are well reflected in the suggested clustering. Experiments using the Iris data set are conducted for validation purposes on classification and clustering benchmark data. The results are consistent with the proposed theoretical framework and data properties, and they open new approaches for considering data processing from different perspectives and for interpreting the contribution of data attributes to patterns.
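
    A minimal sketch of the potential construction, assuming an isotropic Cauchy-type kernel with scale gamma (a simplification: the paper's overall potential is anisotropic). It evaluates the potential and the magnitude of its gradient at a query point, the two quantities driving the proximity curve.

        import numpy as np

        def potential(x, points, gamma=1.0):
            """Overall potential at x: a sum of Cauchy-type kernels
            centred at the data points (isotropic simplification)."""
            d2 = np.sum((points - x) ** 2, axis=1)
            return np.sum(1.0 / (1.0 + d2 / gamma ** 2))

        def grad_magnitude(x, points, gamma=1.0):
            """Magnitude of the potential's gradient at x, in closed form:
            grad = sum_i 2(p_i - x) / (gamma^2 (1 + |p_i - x|^2/gamma^2)^2)."""
            diff = points - x
            d2 = np.sum(diff ** 2, axis=1)
            w = 2.0 / (gamma ** 2 * (1.0 + d2 / gamma ** 2) ** 2)
            return np.linalg.norm((w[:, None] * diff).sum(axis=0))

        pts = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]])
        print(potential(np.array([0.05, 0.0]), pts))
        print(grad_magnitude(np.array([1.0, 0.0]), pts))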

    Linear, Deterministic, and Order-Invariant Initialization Methods for the K-Means Clustering Algorithm

    Over the past five decades, k-means has become the clustering algorithm of choice in many application domains, primarily due to its simplicity, time/space efficiency, and invariance to the ordering of the data points. Unfortunately, the algorithm's sensitivity to the initial selection of the cluster centers remains its most serious drawback. Numerous initialization methods have been proposed to address this drawback. Many of these methods, however, have time complexity superlinear in the number of data points, which makes them impractical for large data sets. On the other hand, linear methods are often random and/or sensitive to the ordering of the data points. These methods are generally unreliable in that the quality of their results is unpredictable. Therefore, it is common practice to perform multiple runs of such methods and take the output of the run that produces the best results. Such a practice, however, greatly increases the computational requirements of the otherwise highly efficient k-means algorithm. In this chapter, we investigate the empirical performance of six linear, deterministic (non-random), and order-invariant k-means initialization methods on a large and diverse collection of data sets from the UCI Machine Learning Repository. The results demonstrate that two relatively unknown hierarchical initialization methods due to Su and Dy outperform the remaining four methods with respect to two objective effectiveness criteria. In addition, a recent method due to Erisoglu et al. performs surprisingly poorly.
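
    The Su and Dy methods themselves are not reproduced here; instead, the sketch below shows one simple seeding with the three properties in the chapter title: it runs in time linear in the number of points, it is deterministic, and, ties aside, it is order-invariant. The scheme (first centre nearest the data mean, then maximin) is an illustrative stand-in.

        import numpy as np

        def deterministic_maximin_init(X, k):
            """Deterministic, order-invariant k-means seeding (a sketch,
            not Su and Dy's methods): the first centre is the point
            closest to the overall mean; each further centre is the point
            farthest from its nearest already-chosen centre."""
            X = np.asarray(X, dtype=float)
            first = np.argmin(((X - X.mean(axis=0)) ** 2).sum(axis=1))
            centers = [X[first]]
            for _ in range(1, k):
                d = np.min([((X - c) ** 2).sum(axis=1) for c in centers],
                           axis=0)
                centers.append(X[np.argmax(d)])
            return np.array(centers)

        X = np.array([[0, 0], [0, 1], [10, 10], [10, 11], [5, 5]])
        print(deterministic_maximin_init(X, 2))  # [[5. 5.] [10. 11.]]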

    Generalized alignment-based trace clustering of process behavior

    Process mining techniques use event logs containing real process executions in order to mine, align, and extend process models. The partition of an event log into trace variants facilitates the understanding and analysis of traces, so it is a common pre-processing step in process mining environments. Trace clustering automates this partition; traditionally, it has been applied without taking into consideration the availability of a process model. In this paper, we extend our previous work on process-model-based trace clustering by allowing cluster centroids to have a complex structure that can range from a partial order down to a subnet of the initial process model. This way, the new clustering framework presented in this paper is able to cluster together traces that are distant only due to concurrency or loop constructs in process models. We present a complexity analysis of the different instantiations of the trace clustering framework, and we have implemented it in a prototype tool that has been tested on different datasets.
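
    A sketch of the pre-processing step mentioned above: partitioning an event log into trace variants by grouping identical activity sequences. The log format and names are assumptions for illustration; the closing comment points at the limitation that the model-based clustering in the paper addresses.

        from collections import defaultdict

        def trace_variants(log):
            """Group traces with identical activity sequences into variants.
            `log` maps case ids to activity sequences (assumed format)."""
            variants = defaultdict(list)
            for case_id, activities in log.items():
                variants[tuple(activities)].append(case_id)
            return dict(variants)

        log = {
            "c1": ["register", "check", "pay"],
            "c2": ["register", "pay", "check"],  # check/pay are concurrent
            "c3": ["register", "check", "pay"],
        }
        # Plain variant grouping separates c2 from c1/c3 even though they
        # differ only in the interleaving of two concurrent activities;
        # closing that gap is what the model-based clustering is for.
        print(trace_variants(log))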

    Commonality Preserving Multiple Instance Clustering Based on Diverse Density

    Image-set clustering is the problem of decomposing a given image set into disjoint subsets satisfying specified criteria. For single-vector image representations, a proximity or similarity criterion is widely applied, i.e., proximal or similar images form a cluster. The recent trend in image description, however, is local-feature based, i.e., an image is described by multiple local features, e.g., SIFT, SURF, and so on. With this description, which criterion should be employed for the clustering? As an answer to this question, this paper presents an image-set clustering method based on commonality; that is, images preserving strong commonality (coherent local features) form a cluster. Under this criterion, image variations that do not affect the common features are harmless. In the case of face images, hair-style changes and partial occlusions by glasses may not affect the cluster formation. We defined four commonality measures based on Diverse Density, which are used in agglomerative clustering. Through comparative experiments, we confirmed that two of our methods perform better than the other methods examined in the experiments.
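
    The paper's four Diverse Density based measures are not reproduced here; as a simplified stand-in, the sketch below scores the commonality of two images, each a bag of local feature vectors, by a symmetrised best-match average. It captures the intended behaviour that shared local features dominate while unmatched ones (e.g., an occluded region) are harmless; the kernel and scale are illustrative.

        import numpy as np

        def commonality(bag_a, bag_b, sigma=1.0):
            """Simplified bag-level commonality (not the paper's Diverse
            Density measures): average over features of one bag of the
            best Gaussian match in the other, symmetrised."""
            def best_match(P, Q):
                d2 = ((P[:, None, :] - Q[None, :, :]) ** 2).sum(axis=2)
                return np.exp(-d2 / (2 * sigma ** 2)).max(axis=1).mean()
            return 0.5 * (best_match(bag_a, bag_b) + best_match(bag_b, bag_a))

        a = np.array([[0.0, 0.0], [1.0, 1.0]])
        b = np.array([[0.0, 0.1], [1.0, 0.9], [9.0, 9.0]])  # outlier feature
        print(commonality(a, b))  # stays high despite the unmatched outlier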
